TAZ & Regional Median Income

Calculating Median Income for Transportation Analysis Zones (TAZ)

This notebook replicates the analysis from the '_Source - TAZ & Regional Median Income - 2022-03-17.xlsb' workbook using updated household and household income data from the 2019-2023 American Community Survey.
Author
Affiliation

Pukar Bhandari

Published

October 8, 2025

1 Environment Setup

Load Libraries

!conda install -c conda-forge numpy pandas geopandas matplotlib seaborn python-dotenv openpyxl
!pip install pygris
Show the code
# For Analysis
import numpy as np
import pandas as pd
import geopandas as gpd

# For Visualization
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter
import seaborn as sns

# Census data query libraries & modules
from pygris import block_groups, counties, states
from pygris.helpers import validate_state
from pygris.data import get_census

# misc
import datetime
import os
from pathlib import Path
import requests

from dotenv import load_dotenv
load_dotenv()
True

Environment Variables

Show the code
PROJECT_CRS = "EPSG:3566"  # NAD83 / Utah Central (ft)
Tip

Need a Census API key? Get one for free at census.gov/developers.

Create a .env file in the project directory and add your Census API key:

CENSUS_API_KEY=your-key-here

This enables fetching US Census data from the Census API.

Show the code
# Alternatively, set the API key directly in the environment
# (replace the placeholder; the .env approach above keeps the key out of the notebook)
os.environ['CENSUS_API_KEY'] = 'your_api_key_here'
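As a quick sanity check (not part of the original workbook), you can confirm the key is actually visible to the process before querying the API; `CENSUS_API_KEY` is the variable name assumed throughout this notebook:

```python
import os

# Verify the Census API key is available before making any requests.
# Assumes the key was set via .env (load_dotenv) or the cell above.
api_key = os.getenv("CENSUS_API_KEY")
if not api_key or api_key == "your_api_key_here":
    print("Warning: CENSUS_API_KEY is not set; requests may be throttled.")
else:
    print(f"Census API key loaded ({len(api_key)} characters).")
```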
Show the code
STATE_FIPS = validate_state("UT")
Using FIPS code '49' for input 'UT'

2 Fetch and Load Raw Data

Transportation Analysis Zone

Show the code
gdf_TAZ = gpd.read_file(
  r"_data/TAZ/WFv920_TAZ.shp"
).to_crs(PROJECT_CRS)

gdf_TAZ.explore()

Consumer Price Index

Data Source: Consumer Price Index for All Urban Consumers (CPI-U) [Source: Bureau of Labor Statistics]

Show the code
# Ensure file exists
filepath_cpi = Path("_data/bls/r-cpi-u-rs-allitems.xlsx")
if not filepath_cpi.exists():
    filepath_cpi.parent.mkdir(parents=True, exist_ok=True)
    response = requests.get(
        "https://www.bls.gov/cpi/research-series/r-cpi-u-rs-allitems.xlsx",
        timeout=60,
    )
    response.raise_for_status()  # Fail fast on HTTP errors
    filepath_cpi.write_bytes(response.content)

# Read Excel file directly from URL
df_CPI = pd.read_excel(
  filepath_cpi, # File path
  sheet_name="All items",
  usecols="A:N", # TODO: update cols later for new data
  skiprows=5,    # Skip the header rows above the table
  engine='openpyxl'  # Specify engine
)

# display the data
df_CPI
YEAR JAN FEB MAR APR MAY JUNE JULY AUG SEP OCT NOV DEC AVG
0 1977 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 100.0 NaN
1 1978 100.5 101.1 101.8 102.7 103.6 104.5 105.0 105.5 106.1 106.7 107.3 107.8 104.4
2 1979 108.7 109.7 110.7 111.8 113.0 114.1 115.1 116.0 117.1 117.9 118.5 119.5 114.3
3 1980 120.8 122.4 123.8 124.7 125.7 126.7 127.5 128.6 129.9 130.7 131.5 132.4 127.1
4 1981 133.6 135.2 136.3 137.1 137.9 138.7 139.7 140.7 141.8 142.4 142.9 143.4 139.1
5 1982 144.2 144.7 144.9 145.0 146.1 147.5 148.5 148.8 149.5 150.2 150.5 150.6 147.5
6 1983 151.0 151.1 151.2 152.4 153.2 153.7 154.3 154.8 155.6 156.0 156.1 156.3 153.8
7 1984 157.2 158.0 158.3 159.0 159.5 159.9 160.4 161.1 161.8 162.2 162.2 162.3 160.2
8 1985 162.5 163.2 163.9 164.6 165.2 165.6 165.9 166.2 166.8 167.2 167.7 168.1 165.6
9 1986 168.6 168.1 167.3 166.9 167.4 168.2 168.2 168.5 169.4 169.5 169.5 169.6 168.4
10 1987 170.6 171.3 172.0 172.9 173.4 174.0 174.3 175.3 176.1 176.5 176.6 176.5 174.1
11 1988 177.0 177.3 178.0 178.9 179.5 180.1 180.8 181.6 182.7 183.2 183.3 183.4 180.5
12 1989 184.3 184.9 185.9 187.2 188.1 188.5 189.0 189.2 189.8 190.6 190.9 191.1 188.3
13 1990 193.0 193.9 194.9 195.2 195.5 196.5 197.3 199.0 200.6 201.7 202.0 202.0 197.6
14 1991 202.9 203.1 203.3 203.5 204.1 204.5 204.7 205.2 206.1 206.2 206.7 206.8 204.8
15 1992 207.2 207.7 208.6 209.0 209.3 209.7 210.1 210.6 211.2 211.8 212.1 211.9 209.9
16 1993 212.6 213.3 214.0 214.5 215.0 215.2 215.3 215.7 216.0 216.7 216.9 216.7 215.2
17 1994 217.1 217.6 218.4 218.6 218.9 219.5 220.0 220.7 221.1 221.2 221.5 221.4 219.7
18 1995 222.2 222.9 223.6 224.3 224.7 225.2 225.3 225.7 226.1 226.7 226.5 226.4 225.0
19 1996 227.5 228.3 229.4 230.2 230.8 230.9 231.3 231.5 232.3 233.0 233.4 233.4 231.0
20 1997 234.1 234.7 235.2 235.5 235.4 235.8 235.9 236.3 237.1 237.5 237.4 237.0 236.0
21 1998 237.4 237.8 238.1 238.6 239.0 239.1 239.4 239.7 240.0 240.5 240.5 240.3 239.2
22 1999 240.9 241.2 241.9 243.6 243.6 243.7 244.4 245.1 246.2 246.7 246.8 246.8 244.2
23 2000 247.6 249.1 251.1 251.2 251.4 252.9 253.4 253.4 254.8 255.1 255.3 255.1 252.5
24 2001 256.8 257.9 258.5 259.5 260.5 261.1 260.3 260.4 261.4 260.6 260.1 259.1 259.7
25 2002 259.8 260.8 262.2 263.7 263.6 263.9 264.1 265.0 265.4 265.9 265.9 265.3 263.8
26 2003 266.4 268.6 270.2 269.5 269.1 269.5 269.7 270.7 271.6 271.4 270.6 270.3 269.8
27 2004 271.7 273.2 274.9 275.8 277.3 278.2 277.8 277.9 278.5 280.0 280.2 279.1 277.0
28 2005 279.6 281.3 283.5 285.4 285.2 285.2 286.5 288.0 291.5 292.2 289.8 288.6 286.4
29 2006 290.8 291.4 293.1 295.5 296.9 297.6 298.4 299.1 297.6 296.0 295.6 296.0 295.7
30 2007 296.9 298.5 301.2 303.1 305.0 305.6 305.5 304.9 305.8 306.4 308.3 308.1 304.1
31 2008 309.6 310.5 313.2 315.1 317.7 320.9 322.6 321.3 320.9 317.6 311.6 308.3 315.8
32 2009 309.7 311.2 312.0 312.8 313.7 316.4 315.8 316.6 316.8 317.1 317.3 316.7 314.7
33 2010 317.8 317.9 319.2 319.8 320.0 319.7 319.8 320.2 320.4 320.8 320.9 321.5 319.8
34 2011 323.0 324.6 327.8 329.9 331.5 331.1 331.4 332.3 332.8 332.2 331.9 331.1 330.0
35 2012 332.6 334.0 336.6 337.6 337.2 336.7 336.2 338.1 339.6 339.5 337.9 337.0 336.9
36 2013 338.0 340.8 341.7 341.3 341.9 342.8 342.9 343.3 343.8 342.9 342.2 342.2 342.0
37 2014 343.5 344.8 347.0 348.2 349.4 350.1 349.9 349.3 349.6 348.7 346.9 344.9 347.7
38 2015 343.4 344.9 346.9 347.7 349.4 350.7 350.7 350.2 349.7 349.5 348.8 347.6 348.3
39 2016 348.2 348.5 350.0 351.7 353.1 354.2 353.7 354.0 354.8 355.3 354.7 354.9 352.8
40 2017 356.9 358.0 358.3 359.4 359.7 360.0 359.8 360.9 362.8 362.5 362.6 362.3 360.3
41 2018 364.3 366.0 366.7 368.3 369.8 370.3 370.4 370.7 371.1 371.8 370.6 369.3 369.1
42 2019 370.0 371.5 373.6 375.6 376.4 376.5 377.2 377.1 377.5 378.4 378.2 377.8 375.8
43 2020 379.2 380.2 379.5 377.2 377.2 379.4 381.3 382.6 383.1 383.2 382.9 383.2 380.8
44 2021 384.9 387.1 390.0 393.2 396.7 400.5 402.4 403.1 404.2 407.6 409.5 410.8 399.2
45 2022 414.3 418.2 423.9 426.3 431.0 436.9 436.8 436.7 437.6 439.4 439.0 437.6 431.5
46 2023 441.1 443.6 445.0 447.3 448.4 449.9 450.7 452.7 453.8 453.6 452.7 452.3 449.3
47 2024 454.7 457.6 460.5 462.3 463.1 463.2 463.8 464.1 464.9 465.4 465.2 465.3 462.5
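The AVG column above is what the original workbook uses to put incomes on a common dollar basis. A minimal sketch of that adjustment, with the 2019 and 2023 annual averages hard-coded from the table (a real run would read them from df_CPI):

```python
# Adjust a dollar amount between years using the CPI-U-RS annual averages
# (AVG column above). The 2019 and 2023 values are copied from the table.
CPI_AVG = {2019: 375.8, 2023: 449.3}

def adjust_dollars(amount, from_year, to_year, cpi=CPI_AVG):
    """Rescale `amount` from `from_year` dollars to `to_year` dollars."""
    return amount * cpi[to_year] / cpi[from_year]

# e.g., $70,000 in 2019 dollars expressed in 2023 dollars
print(round(adjust_dollars(70_000, 2019, 2023)))
```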

ACS_5YR_HouseholdIncome

Set Census Variables

Show the code
# Define variables to download
acs_variables = {
    'B19013_001E': 'HH_MED_INC',  # Median Household Income in the Past 12 Months (in 2023 Inflation-Adjusted Dollars)
    'B19013_001M': 'HH_MED_INC_MOE',  # Margin of Error for Median Household Income
    'B19001_001E': 'HH_TOTAL',  # Total Households
    'B19001_002E': 'HH_LT_10K',  # Less than $10,000
    'B19001_003E': 'HH_10_15K',  # $10,000 to $14,999
    'B19001_004E': 'HH_15_20K',  # $15,000 to $19,999
    'B19001_005E': 'HH_20_25K',  # $20,000 to $24,999
    'B19001_006E': 'HH_25_30K',  # $25,000 to $29,999
    'B19001_007E': 'HH_30_35K',  # $30,000 to $34,999
    'B19001_008E': 'HH_35_40K',  # $35,000 to $39,999
    'B19001_009E': 'HH_40_45K',  # $40,000 to $44,999
    'B19001_010E': 'HH_45_50K',  # $45,000 to $49,999
    'B19001_011E': 'HH_50_60K',  # $50,000 to $59,999
    'B19001_012E': 'HH_60_75K',  # $60,000 to $74,999
    'B19001_013E': 'HH_75_100K',  # $75,000 to $99,999
    'B19001_014E': 'HH_100_125K',  # $100,000 to $124,999
    'B19001_015E': 'HH_125_150K',  # $125,000 to $149,999
    'B19001_016E': 'HH_150_200K',  # $150,000 to $199,999
    'B19001_017E': 'HH_GT_200K'  # $200,000 or more
}

ACS_5YR_State

Show the code
# Fetch state boundaries from TIGER/Line shapefiles
gdf_ut_bound = states(
  year=2023,
  cache=True
).to_crs(PROJECT_CRS)

# Filter for Utah only
gdf_ut_bound = gdf_ut_bound[gdf_ut_bound['STATEFP'] == str(STATE_FIPS)]
Show the code
# Fetch Income data from ACS 5-year estimates for Utah
df_ut_income = get_census(
  dataset="acs/acs5",
  year=2023,
  variables=list(acs_variables.keys()),
  params={
      # "key": f"{os.getenv('CENSUS_API_KEY')}", # FIXME: This causes error
      "for": f"state:{STATE_FIPS}"
    },
    return_geoid=True,
    guess_dtypes=True
)
Show the code
# Join ACS data to block group boundaries and transform CRS
gdf_ut_income = gdf_ut_bound[['GEOID', 'STATEFP', 'NAME', 'geometry']].merge(df_ut_income, on = "GEOID").to_crs(PROJECT_CRS)
Show the code
# Rename columns
gdf_ut_income = gdf_ut_income.rename(columns=acs_variables)
Show the code
# Preview data
gdf_ut_income
GEOID STATEFP NAME geometry HH_MED_INC HH_MED_INC_MOE HH_TOTAL HH_LT_10K HH_10_15K HH_15_20K ... HH_35_40K HH_40_45K HH_45_50K HH_50_60K HH_60_75K HH_75_100K HH_100_125K HH_125_150K HH_150_200K HH_GT_200K
0 49 49 Utah POLYGON ((900313.399 6302435.171, 900580.099 6... 91750 634 1094896 33918 22999 22352 ... 33072 32113 35150 69627 104883 159368 134089 102926 124090 137560

1 rows × 23 columns

ACS_5YR_County

Show the code
# Fetch county boundaries from TIGER/Line shapefiles
gdf_county_bound = counties(
  state=STATE_FIPS,
  year=2023,
  cache=True
).to_crs(PROJECT_CRS)

# Fetch Income data from ACS 5-year estimates for counties in Utah
df_county_income = get_census(
  dataset="acs/acs5",
  year=2023,
  variables=list(acs_variables.keys()),
  params={
      # "key": f"{os.getenv('CENSUS_API_KEY')}", # FIXME: This causes error
      "for": f"county:*",
      "in": f"state:{STATE_FIPS}"
    },
    return_geoid=True,
    guess_dtypes=True
)

# Join ACS data to county boundaries and transform CRS
gdf_county_income = gdf_county_bound[['GEOID', 'COUNTYFP', 'NAMELSAD', 'geometry']].merge(df_county_income, on = "GEOID").to_crs(PROJECT_CRS)

# Rename columns
gdf_county_income = gdf_county_income.rename(columns=acs_variables)

# Preview data
gdf_county_income.head(10)
GEOID COUNTYFP NAMELSAD geometry HH_MED_INC HH_MED_INC_MOE HH_TOTAL HH_LT_10K HH_10_15K HH_15_20K ... HH_35_40K HH_40_45K HH_45_50K HH_50_60K HH_60_75K HH_75_100K HH_100_125K HH_125_150K HH_150_200K HH_GT_200K
0 49033 033 Rich County POLYGON ((1637136.748 7863034.219, 1637160.731... 76875 7743 817 15 23 32 ... 20 21 11 63 91 160 125 19 61 60
1 49005 005 Cache County POLYGON ((1459262.09 7896818.424, 1459327.308 ... 78292 2156 43118 1271 930 954 ... 1870 1727 1567 3248 4487 6638 5606 3464 3047 3667
2 49013 013 Duchesne County POLYGON ((1806607.907 7329761.438, 1806600.198... 74738 3589 6714 231 227 260 ... 200 299 211 524 640 940 883 539 491 491
3 49011 011 Davis County POLYGON ((1499977.044 7504652.389, 1498947.029... 108058 2051 114674 2575 1654 1829 ... 2565 2316 2696 6301 10134 16154 15542 12459 16209 18090
4 49003 003 Box Elder County POLYGON ((1466489.48 7656413.112, 1465439.601 ... 77865 3036 19150 869 676 365 ... 612 536 859 1321 2552 3147 2298 1580 1743 1184
5 49027 027 Millard County POLYGON ((1460149.259 6925110.92, 1460389.428 ... 70877 3293 4267 101 72 166 ... 178 193 257 255 688 750 400 297 311 161
6 49029 029 Morgan County POLYGON ((1536712.978 7595166.96, 1536762.027 ... 126092 8031 3605 38 19 22 ... 62 27 143 192 197 436 563 562 595 680
7 49057 057 Weber County POLYGON ((1509812.222 7673492.309, 1509866.401... 87083 1853 92353 3017 1835 1777 ... 3028 2664 3263 6514 9091 14891 11876 8651 9309 8897
8 49025 025 Kane County POLYGON ((1232395.237 6183379.547, 1232409.801... 75000 11394 3192 135 147 83 ... 109 115 211 251 237 407 497 242 244 206
9 49035 035 Salt Lake County POLYGON ((1574376.495 7361223.431, 1574337.618... 94658 1220 416589 12755 9578 8474 ... 12290 11803 13030 25578 37795 60857 49288 38966 49564 58280

10 rows × 23 columns

Show the code
# Preview the data in an interactive map
gdf_county_income.explore(column="HH_MED_INC", cmap="YlGnBu", legend=True)
Show the code
# Data Validity Check: Coefficient of Variation (CV) for Median Household Income
gdf_county_income['HH_MED_INC_CV'] = (
  gdf_county_income['HH_MED_INC_MOE'] / 1.645
) / gdf_county_income['HH_MED_INC'] * 100

gdf_county_income[['NAMELSAD', 'HH_MED_INC', 'HH_MED_INC_MOE', 'HH_MED_INC_CV']].sort_values(by='HH_MED_INC_CV', ascending=False)
NAMELSAD HH_MED_INC HH_MED_INC_MOE HH_MED_INC_CV
16 Daggett County 58750 17147 17.742482
13 Piute County 44650 7643 10.405824
17 Beaver County 85603 13902 9.872392
8 Kane County 75000 11394 9.235258
11 Garfield County 61688 8586 8.461051
27 Grand County 62521 8080 7.856327
20 Wayne County 70074 9056 7.856219
25 San Juan County 54890 6962 7.710364
0 Rich County 76875 7743 6.122915
21 Juab County 89803 8703 5.891315
26 Wasatch County 115146 8912 4.705009
15 Emery County 69956 5299 4.604718
22 Carbon County 53673 3677 4.164586
23 Iron County 65527 4321 4.008649
6 Morgan County 126092 8031 3.871829
2 Duchesne County 74738 3589 2.919215
18 Uintah County 69861 3315 2.884582
5 Millard County 70877 3293 2.824363
14 Summit County 137058 6366 2.823556
10 Sevier County 73765 3388 2.792075
28 Tooele County 101846 4513 2.693739
12 Washington County 76411 3253 2.587988
24 Sanpete County 67459 2817 2.538523
4 Box Elder County 77865 3036 2.370247
1 Cache County 78292 2156 1.674039
7 Weber County 87083 1853 1.293529
3 Davis County 108058 2051 1.153833
9 Salt Lake County 94658 1220 0.783496
19 Utah County 96877 1030 0.646325
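A common follow-up (my addition, using conventional reliability thresholds rather than anything from the source workbook) is to bucket the estimates by CV, e.g. high reliability at or below 12%, medium up to 40%, low above that:

```python
import pandas as pd

# Classify estimate reliability from the coefficient of variation (CV, in %).
# Thresholds follow a common convention in ACS data-quality guidance:
#   CV <= 12 -> high, 12 < CV <= 40 -> medium, CV > 40 -> low.
def reliability(cv):
    if cv <= 12:
        return "high"
    elif cv <= 40:
        return "medium"
    return "low"

# Values copied from the table above (Daggett has the largest CV in Utah)
df = pd.DataFrame({
    "county": ["Daggett", "Piute", "Salt Lake"],
    "HH_MED_INC_CV": [17.74, 10.41, 0.78],
})
df["reliability"] = df["HH_MED_INC_CV"].apply(reliability)
print(df)
```

Even the worst county-level CV here stays under 18%, so all county medians are at least medium-reliability.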

ACS_5YR_BlockGroup

Show the code
# Fetch block group boundaries from TIGER/Line shapefiles
gdf_bg_bound = block_groups(
  state=STATE_FIPS,
  year=2023,
  cache=True
).to_crs(PROJECT_CRS)

# Fetch Income data from ACS 5-year estimates for block groups in Utah
df_bg_income = get_census(
  dataset="acs/acs5",
  year=2023,
  variables=list(acs_variables.keys()),
  params={
      # "key": f"{os.getenv('CENSUS_API_KEY')}", # FIXME: This causes error
      "for": f"block group:*",
      "in": f"state:{STATE_FIPS} county:*"
    },
    return_geoid=True,
    guess_dtypes=True
)

# Join ACS data to block group boundaries and transform CRS
gdf_bg_income = gdf_bg_bound[['GEOID', 'COUNTYFP', 'NAMELSAD', 'geometry']].merge(df_bg_income, on = "GEOID").to_crs(PROJECT_CRS)

# Rename columns
gdf_bg_income = gdf_bg_income.rename(columns=acs_variables)

# preview data
gdf_bg_income.head(10)
GEOID COUNTYFP NAMELSAD geometry HH_MED_INC HH_MED_INC_MOE HH_TOTAL HH_LT_10K HH_10_15K HH_15_20K ... HH_35_40K HH_40_45K HH_45_50K HH_50_60K HH_60_75K HH_75_100K HH_100_125K HH_125_150K HH_150_200K HH_GT_200K
0 490039603014 003 Block Group 4 POLYGON ((1457672.386 7795318.902, 1457692.203... 95313.0 47823.0 373 0 0 0 ... 16 13 17 23 61 59 45 32 66 33
1 490039603021 003 Block Group 1 POLYGON ((1454296.557 7798327.645, 1454324.964... 72872.0 20284.0 801 0 15 0 ... 22 12 117 0 146 227 54 0 36 12
2 490039603011 003 Block Group 1 POLYGON ((1458913.147 7790335.904, 1458913.726... 99519.0 29498.0 367 3 3 6 ... 2 21 11 12 36 84 37 39 52 53
3 490039603012 003 Block Group 2 POLYGON ((1457125.723 7786638.014, 1457132.453... 64970.0 16467.0 505 0 69 0 ... 0 0 15 23 158 64 69 39 0 0
4 490039603013 003 Block Group 3 POLYGON ((1457448.445 7792187.142, 1457460.447... 71507.0 23375.0 313 0 0 0 ... 20 0 0 0 113 0 81 17 0 18
5 490039603022 003 Block Group 2 POLYGON ((1447664.85 7792861.347, 1447700.369 ... 75333.0 34885.0 386 0 14 50 ... 0 14 0 37 76 54 9 73 39 20
6 490039603023 003 Block Group 3 POLYGON ((1440417.061 7797111.403, 1440454.858... 70511.0 14091.0 515 52 14 0 ... 53 0 22 65 173 73 24 0 0 39
7 490439643053 043 Block Group 3 POLYGON ((1611350.82 7412360.772, 1611358.374 ... 250001.0 NaN 127 0 0 0 ... 0 0 7 0 0 12 9 4 5 79
8 490039603024 003 Block Group 4 POLYGON ((1454390.221 7792981.289, 1454412.898... 56402.0 8388.0 444 0 20 10 ... 48 17 18 138 60 33 44 18 38 0
9 490439644021 043 Block Group 1 POLYGON ((1637685.712 7393688.213, 1637745.164... NaN NaN 148 0 0 0 ... 0 61 0 0 21 0 23 0 0 43

10 rows × 23 columns

Show the code
# Data Validity Check
gdf_bg_income[
    gdf_bg_income["HH_MED_INC"].isna() | (gdf_bg_income["HH_MED_INC"] >= 250000)
]
GEOID COUNTYFP NAMELSAD geometry HH_MED_INC HH_MED_INC_MOE HH_TOTAL HH_LT_10K HH_10_15K HH_15_20K ... HH_35_40K HH_40_45K HH_45_50K HH_50_60K HH_60_75K HH_75_100K HH_100_125K HH_125_150K HH_150_200K HH_GT_200K
7 490439643053 043 Block Group 3 POLYGON ((1611350.82 7412360.772, 1611358.374 ... 250001.0 NaN 127 0 0 0 ... 0 0 7 0 0 12 9 4 5 79
9 490439644021 043 Block Group 1 POLYGON ((1637685.712 7393688.213, 1637745.164... NaN NaN 148 0 0 0 ... 0 61 0 0 21 0 23 0 0 43
11 490439644023 043 Block Group 3 POLYGON ((1644362.851 7410618.596, 1644376.438... NaN NaN 212 10 0 0 ... 12 0 13 68 0 0 0 25 32 28
17 490111257022 011 Block Group 2 POLYGON ((1495452.696 7576157.753, 1495455.973... NaN NaN 624 96 103 50 ... 20 0 8 29 113 109 25 9 20 0
36 490451308001 045 Block Group 1 POLYGON ((1374205.571 7390229.263, 1374207.902... NaN NaN 294 0 20 0 ... 60 0 0 0 27 56 29 0 0 91
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1927 490490022091 049 Block Group 1 POLYGON ((1568307.053 7287618.341, 1569785.588... NaN NaN 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1964 490499806001 049 Block Group 1 POLYGON ((1573655 7254082.838, 1573794.857 725... NaN NaN 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1981 490490014032 049 Block Group 2 POLYGON ((1592314.617 7262270.71, 1592370.524 ... NaN NaN 46 0 12 17 ... 0 0 0 0 7 10 0 0 0 0
1985 490490017013 049 Block Group 3 POLYGON ((1600460.288 7259733.736, 1600492.384... NaN NaN 89 0 0 0 ... 0 0 0 0 0 23 0 9 18 1
1989 490490027024 049 Block Group 4 POLYGON ((1605212.125 7249518.913, 1605235.656... NaN NaN 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

73 rows × 23 columns
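Before any downstream use, these 73 block groups need handling: NaN medians mean the ACS suppressed the estimate, and 250,001 is the ACS top-code for medians at or above $250,000. A hedged sketch of how the two cases can be flagged (the flagging scheme is mine, not from the source workbook):

```python
import numpy as np
import pandas as pd

# Flag block-group medians that can't be used as-is:
#  - NaN: ACS suppressed the estimate (too few sample households)
#  - 250001: ACS top-code, meaning the true median is >= $250,000
def flag_median(med):
    if pd.isna(med):
        return "suppressed"
    if med >= 250_000:
        return "top-coded"
    return "ok"

# Values mirror rows 7 and 9 of the preview above
medians = pd.Series([95_313.0, np.nan, 250_001.0, 64_970.0])
print(medians.map(flag_median).tolist())
```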

Chart County Median Income - Scaled by Number of Households

Show the code
# Create plotting dataframe with selected columns
df_plot = gdf_county_income[['NAMELSAD', 'HH_MED_INC', 'HH_TOTAL']].copy()

# Rank counties by median household income (highest = rank 1)
df_plot["Rank"] = df_plot["HH_MED_INC"].rank(
    method="min",       # assign smallest rank for ties
    ascending=False     # highest income gets rank 1
).astype(int)

# Create Region column based on MPO counties
mpo_counties = ['Summit County', 'Wasatch County', 'Davis County', 'Salt Lake County',
                'Utah County', 'Weber County', 'Washington County', 'Cache County', 'Iron County']

df_plot['Region'] = df_plot['NAMELSAD'].apply(lambda x: 'MPO' if x in mpo_counties else 'Non-MPO')

df_plot.sort_values(by="Rank")
NAMELSAD HH_MED_INC HH_TOTAL Rank Region
14 Summit County 137058 14477 1 MPO
6 Morgan County 126092 3605 2 Non-MPO
26 Wasatch County 115146 11985 3 MPO
3 Davis County 108058 114674 4 MPO
28 Tooele County 101846 23050 5 Non-MPO
19 Utah County 96877 195602 6 MPO
9 Salt Lake County 94658 416589 7 MPO
21 Juab County 89803 3756 8 Non-MPO
7 Weber County 87083 92353 9 MPO
17 Beaver County 85603 2386 10 Non-MPO
1 Cache County 78292 43118 11 MPO
4 Box Elder County 77865 19150 12 Non-MPO
0 Rich County 76875 817 13 Non-MPO
12 Washington County 76411 66914 14 MPO
8 Kane County 75000 3192 15 Non-MPO
2 Duchesne County 74738 6714 16 Non-MPO
10 Sevier County 73765 7393 17 Non-MPO
5 Millard County 70877 4267 18 Non-MPO
20 Wayne County 70074 1085 19 Non-MPO
15 Emery County 69956 3428 20 Non-MPO
18 Uintah County 69861 11940 21 Non-MPO
24 Sanpete County 67459 8748 22 Non-MPO
23 Iron County 65527 19658 23 MPO
27 Grand County 62521 4466 24 Non-MPO
11 Garfield County 61688 2027 25 Non-MPO
16 Daggett County 58750 262 26 Non-MPO
25 San Juan County 54890 4650 27 Non-MPO
22 Carbon County 53673 8023 28 Non-MPO
13 Piute County 44650 567 29 Non-MPO
Show the code
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from adjustText import adjust_text

# Define color mapping
color_map = {'MPO': '#843C0C', 'Non-MPO': '#4472C4'}

# Set the style
sns.set_style("whitegrid")

# Create figure
plt.figure(figsize=(10, 6.5))

# Create scatter plot with hue for automatic color separation
sns.scatterplot(
    data=df_plot,
    x='Rank',
    y='HH_MED_INC',
    size='HH_TOTAL',
    sizes=(100, 10000),
    hue='Region',
    palette=color_map,
    alpha=0.3,
    edgecolor=df_plot['Region'].map(color_map),
    linewidth=2.5,
    legend=False
)

# Add smooth trend line
sns.regplot(
    data=df_plot,
    x='Rank',
    y='HH_MED_INC',
    scatter=False,
    order=3,
    ci=None,
    color='red',
    line_kws={'linestyle':':', 'dashes':(1, 1), 'linewidth':2, 'alpha':0.8}
)

# Create list to store text objects
texts = []

# Add labels for each county with color based on Region
for idx, row in df_plot.iterrows():
    county_name = row['NAMELSAD'].replace(' County', '').upper()
    income_label = f"${row['HH_MED_INC']:,.0f}"
    label = f"{county_name}\n{income_label}"
    label_color = color_map[row['Region']]

    text = plt.text(
        row['Rank'],
        row['HH_MED_INC'],
        label,
        fontsize=7,
        ha='center',
        va='center',
        fontweight='bold',
        color=label_color
    )
    texts.append(text)

# Adjust text positions to avoid overlap
adjust_text(
    texts,
    arrowprops=dict(arrowstyle='->', color='gray', lw=0.5, alpha=0.6),
    # expand_points=(100, 100),  # Increased from (1.5, 1.5)
    # expand_text=(100, 100),    # Added to expand text bounding boxes
    # force_text=(100, 100),     # Increased from (0.5, 0.5)
    # force_points=(100, 100),   # Increased from (0.3, 0.3)
    lim=500                    # Increased iteration limit for better placement
)

# Add horizontal line at median income
state_avg = gdf_ut_income['HH_MED_INC'].values[0]
plt.axhline(y=state_avg, color='#C00000', linestyle='--', linewidth=2, alpha=0.4)

# Add text annotations for state average
plt.text(29.5, state_avg + 1500, f'Utah State Average: ${state_avg:,.0f}', fontsize=10, color='#C00000',
         ha='right', va='bottom', fontweight='bold')

# Set axis limits and ticks
plt.xlim(0, 30)
plt.ylim(0, 150000)
plt.xticks(range(0, 31, 5))

# Format y-axis
plt.gca().yaxis.set_major_locator(ticker.MultipleLocator(20000))
plt.gca().yaxis.set_minor_locator(ticker.MultipleLocator(10000))
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, p: f'${x:,.0f}'))

# Customize grid
plt.grid(True, which='major', axis='y', alpha=0.4, linewidth=0.8, color='gray')
plt.grid(True, which='minor', axis='y', alpha=0.2, linewidth=0.5, color='gray')

# Set labels and title
plt.xlabel('Rank', fontsize=12, fontweight='bold', color='#595959')
plt.ylabel('Median Household Income', fontsize=12, fontweight='bold', color='#595959')
plt.title('Median Income in 2023 Dollars', fontsize=16, fontweight='bold', pad=20, color='#595959')
plt.gca().text(0.5, 1.02, 'Source: 2019-2023 American Community Survey',
               transform=plt.gca().transAxes, ha='center', fontsize=10.5,
               color='#595959', style='italic')

plt.tight_layout()
plt.show()
Looks like you are using a tranform that doesn't support FancyArrowPatch, using ax.annotate instead. The arrows might strike through texts. Increasing shrinkA in arrowprops might help.

BlockSplit_wCOTAZID

Show the code
# Define a function for population weighted interpolation
def interpolate_pw(from_gdf, to_gdf, weights_gdf, to_id=None, extensive=True,
                   weight_column=None, weight_placement='surface', crs=None):
    """
    Population-weighted areal interpolation between geometries.

    Transfers numeric data from source geometries to target geometries using
    population-weighted interpolation based on point weights (e.g., census blocks).

    Parameters:
    -----------
    from_gdf : GeoDataFrame
        Source geometries with numeric data to interpolate
    to_gdf : GeoDataFrame
        Target geometries to interpolate data to
    weights_gdf : GeoDataFrame
        Weight geometries (e.g., census blocks) used for interpolation.
        If polygons, will be converted to points. Can be the same as to_gdf.
    to_id : str, optional
        Column name for unique identifier in target geometries.
        If None, creates an 'id' column.
    extensive : bool, default True
        If True, return weighted sums (for counts).
        If False, return weighted means (for rates/percentages).
    weight_column : str, optional
        Column name in weights_gdf for weighting (e.g., 'POP', 'HH').
        If None, all weights are equal.
    weight_placement : str, default 'surface'
        How to convert polygons to points: 'surface' or 'centroid'
    crs : str or CRS object, optional
        Coordinate reference system to project all datasets to

    Returns:
    --------
    GeoDataFrame
        Target geometries with interpolated numeric values
    """
    import warnings
    import geopandas as gpd
    import numpy as np
    import pandas as pd

    # Input validation
    if not all(isinstance(gdf, gpd.GeoDataFrame) for gdf in [from_gdf, to_gdf, weights_gdf]):
        raise ValueError("All inputs must be GeoDataFrames")

    # Make copies to avoid modifying originals
    from_gdf = from_gdf.copy()
    to_gdf = to_gdf.copy()
    weights_gdf = weights_gdf.copy()

    # Set CRS if provided
    if crs:
        from_gdf = from_gdf.to_crs(crs)
        to_gdf = to_gdf.to_crs(crs)
        weights_gdf = weights_gdf.to_crs(crs)

    # Check CRS consistency
    if not (from_gdf.crs == to_gdf.crs == weights_gdf.crs):
        raise ValueError("All inputs must have the same CRS")

    # Handle to_id
    if to_id is None:
        to_id = 'id'
        to_gdf[to_id] = to_gdf.index.astype(str)

    # Remove conflicting columns
    if to_id in from_gdf.columns:
        from_gdf = from_gdf.drop(columns=[to_id])

    # Create unique from_id
    from_id = 'from_id'
    from_gdf[from_id] = from_gdf.index.astype(str)

    # Handle weight column
    if weight_column is None:
        weight_column = 'interpolation_weight'
        weights_gdf[weight_column] = 1.0
    else:
        # Rename to avoid conflicts
        weights_gdf['interpolation_weight'] = weights_gdf[weight_column]
        weight_column = 'interpolation_weight'

    # Convert weights to points if needed
    if weights_gdf.geometry.geom_type.iloc[0] in ['Polygon', 'MultiPolygon']:
        if weight_placement == 'surface':
            weights_gdf = weights_gdf.copy()
            weights_gdf.geometry = weights_gdf.geometry.representative_point()
        elif weight_placement == 'centroid':
            weights_gdf = weights_gdf.copy()
            weights_gdf.geometry = weights_gdf.geometry.centroid
        else:
            raise ValueError("weight_placement must be 'surface' or 'centroid'")

    # Keep only weight column and geometry
    weight_points = weights_gdf[[weight_column, 'geometry']].copy()

    # Calculate denominators (total weights per source geometry)
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=UserWarning)
        source_weights = gpd.sjoin(from_gdf, weight_points, how='left', predicate='contains')

    denominators = (source_weights.groupby(from_id)[weight_column]
                   .sum()
                   .reset_index()
                   .rename(columns={weight_column: 'weight_total'}))

    # Calculate intersections between from and to
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=UserWarning)
        intersections = gpd.overlay(from_gdf, to_gdf, how='intersection')

    # Keep only areal results from the overlay (stray points/lines from edge touches are dropped)
    intersections = intersections[intersections.geometry.geom_type.isin(['Polygon', 'MultiPolygon', 'GeometryCollection'])]

    if len(intersections) == 0:
        raise ValueError("No valid polygon intersections found between source and target geometries")

    # Add intersection ID
    intersections['intersection_id'] = range(len(intersections))

    # Spatial join intersections with weight points to get weights within each intersection
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore', category=UserWarning)
        intersection_weights = gpd.sjoin(intersections, weight_points, how='left', predicate='contains')

    # Calculate intersection values (sum of weights per intersection)
    intersection_values = (intersection_weights.groupby('intersection_id')[weight_column]
                         .sum()
                         .reset_index()
                         .rename(columns={weight_column: 'intersection_value'}))

    # Merge back to intersections and keep only unique intersections
    intersections = intersections.merge(intersection_values, on='intersection_id', how='left')
    intersections['intersection_value'] = intersections['intersection_value'].fillna(0)

    # Remove duplicates created by the spatial join
    intersections = intersections.drop_duplicates(subset='intersection_id')

    # Merge with denominators to calculate weight coefficients
    intersections = intersections.merge(denominators, on=from_id, how='left')
    intersections['weight_total'] = intersections['weight_total'].fillna(1)

    # Calculate weight coefficients (intersection weight / total weight in source)
    intersections.loc[intersections['weight_total'] > 0, 'weight_coef'] = (
        intersections['intersection_value'] / intersections['weight_total']
    )
    intersections['weight_coef'] = intersections['weight_coef'].fillna(0)

    # Get numeric columns from source data
    numeric_cols = from_gdf.select_dtypes(include=[np.number]).columns
    # Remove ID columns
    numeric_cols = [col for col in numeric_cols if col not in [from_id]]

    # Prepare intersection data for interpolation
    intersection_data = intersections[[from_id, to_id, 'weight_coef'] + numeric_cols].copy()

    if extensive:
        # For extensive variables: multiply by weight coefficient, then sum by target
        for col in numeric_cols:
            intersection_data[col] = intersection_data[col] * intersection_data['weight_coef']

        interpolated = (intersection_data.groupby(to_id)[numeric_cols]
                       .sum()
                       .reset_index())
    else:
        # For intensive variables: weighted average
        interpolated_data = []
        for target_id in intersection_data[to_id].unique():
            target_data = intersection_data[intersection_data[to_id] == target_id]
            if len(target_data) > 0 and target_data['weight_coef'].sum() > 0:
                weighted_vals = {}
                for col in numeric_cols:
                    weighted_vals[col] = (target_data[col] * target_data['weight_coef']).sum() / target_data['weight_coef'].sum()
                weighted_vals[to_id] = target_id
                interpolated_data.append(weighted_vals)

        interpolated = pd.DataFrame(interpolated_data)

    # Merge with target geometries
    result = to_gdf[[to_id, 'geometry']].merge(interpolated, on=to_id, how='left')

    # Fill NaN values with 0 for missing interpolations
    for col in numeric_cols:
        if col in result.columns:
            result[col] = result[col].fillna(0)

    return result
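The weighting arithmetic at the heart of `interpolate_pw` can be checked without any geometry: each source/target intersection carries a coefficient (weight inside the intersection divided by total weight in the source zone); extensive variables are scaled by the coefficient and summed per target, while intensive variables get a coefficient-weighted mean. A minimal non-spatial sketch with made-up numbers:

```python
import pandas as pd

# One source zone (100 households) split across two target zones:
# 60 of its weight points fall in T1, 40 in T2 -> coefficients 0.6 and 0.4.
pieces = pd.DataFrame({
    "to_id": ["T1", "T2"],
    "weight_coef": [0.6, 0.4],
    "HH_TOTAL": [100, 100],          # extensive: a count carried by the source zone
    "HH_MED_INC": [80_000, 80_000],  # intensive: a level carried by the source zone
})

# Extensive variables: scale by the coefficient, then sum per target
extensive = (pieces["HH_TOTAL"] * pieces["weight_coef"]).groupby(pieces["to_id"]).sum()

# Intensive variables: coefficient-weighted mean per target
num = (pieces["HH_MED_INC"] * pieces["weight_coef"]).groupby(pieces["to_id"]).sum()
den = pieces.groupby("to_id")["weight_coef"].sum()
intensive = num / den

print(extensive.to_dict())  # the household count splits 60/40 across targets
print(intensive.to_dict())  # the income level is preserved, not split
```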

3 Lookup Tables

ACS Column ID to Label

Show the code
lookup_hhinc = pd.DataFrame({
  "Income Category": [
    "HH_LT_10K", "HH_10_15K", "HH_15_20K", "HH_20_25K", "HH_25_30K", "HH_30_35K",
    "HH_35_40K", "HH_40_45K", "HH_45_50K", "HH_50_60K", "HH_60_75K",
    "HH_75_100K", "HH_100_125K", "HH_125_150K", "HH_150_200K", "HH_GT_200K"
  ],
  "Lower Limit": [
    0, 10000, 15000, 20000, 25000, 30000,
    35000, 40000, 45000, 50000, 60000,
    75000, 100000, 125000, 150000, 200000
  ],
  "Upper Limit": [
    9999, 14999, 19999, 24999, 29999, 34999,
    39999, 44999, 49999, 59999, 74999,
    99999, 124999, 149999, 199999, np.inf
  ]
})

# Compute midpoint and round it
lookup_hhinc['Midpoint'] = (
  (lookup_hhinc['Lower Limit'] + lookup_hhinc['Upper Limit']) / 2
).round()

# Replace infinite midpoint (last category) with 300000
lookup_hhinc.loc[np.isinf(lookup_hhinc["Upper Limit"]), "Midpoint"] = 300000

lookup_hhinc
Income Category Lower Limit Upper Limit Midpoint
0 HH_LT_10K 0 9999.0 5000.0
1 HH_10_15K 10000 14999.0 12500.0
2 HH_15_20K 15000 19999.0 17500.0
3 HH_20_25K 20000 24999.0 22500.0
4 HH_25_30K 25000 29999.0 27500.0
5 HH_30_35K 30000 34999.0 32500.0
6 HH_35_40K 35000 39999.0 37500.0
7 HH_40_45K 40000 44999.0 42500.0
8 HH_45_50K 45000 49999.0 47500.0
9 HH_50_60K 50000 59999.0 55000.0
10 HH_60_75K 60000 74999.0 67500.0
11 HH_75_100K 75000 99999.0 87500.0
12 HH_100_125K 100000 124999.0 112500.0
13 HH_125_150K 125000 149999.0 137500.0
14 HH_150_200K 150000 199999.0 175000.0
15 HH_GT_200K 200000 inf 300000.0
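These bin limits feed the median calculation downstream. The standard grouped-median formula interpolates linearly within the bin containing the (N/2)-th household; the sketch below is my implementation of that textbook formula, not code taken from the workbook:

```python
import numpy as np

def grouped_median(counts, lower, upper):
    """Median from binned counts via linear interpolation within the median bin.

    counts: households per income bin; lower/upper: bin limits (same order).
    """
    counts = np.asarray(counts, dtype=float)
    cum = np.cumsum(counts)
    half = counts.sum() / 2.0
    i = int(np.searchsorted(cum, half))   # first bin whose cumulative count reaches N/2
    below = cum[i - 1] if i > 0 else 0.0  # households below the median bin
    width = upper[i] - lower[i]
    return lower[i] + (half - below) / counts[i] * width

# Toy example: 10 households in bins $0-10k, $10-20k, $20-30k
print(grouped_median([2, 5, 3], [0, 10_000, 20_000], [10_000, 20_000, 30_000]))
```

With 2 + 5 + 3 households, the 5th household falls 3/5 of the way through the $10-20k bin, giving a median of $16,000.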

BlockGroupID for TAZ Centroid (Use if No HH in TAZ)

4 Processing

Process_ACS_BG

Process_BlockSplit

Calc_TAZ_MedInc

5 Export_MedInc

TAZ Median Income (SE Files)

Show the code
gdf_TAZ[['CO_TAZID', 'SUBAREAID', 'CO_FIPS', 'CO_NAME']]
CO_TAZID SUBAREAID CO_FIPS CO_NAME
0 30001 1 3 BOX ELDER
1 30002 1 3 BOX ELDER
2 30003 1 3 BOX ELDER
3 30004 1 3 BOX ELDER
4 30005 1 3 BOX ELDER
... ... ... ... ...
3557 490911 1 49 UTAH
3558 491028 1 49 UTAH
3559 490153 1 49 UTAH
3560 490157 1 49 UTAH
3561 490719 1 49 UTAH

3562 rows × 4 columns

Regional Median Income (General Parameters)